Sains Malaysiana 54(1)(2025): 279-290

http://doi.org/10.17576/jsm-2025-5401-22

 

Optimizing Tuberculosis Treatment Predictions: A Comparative Study of XGBoost with Hyperparameter in Penang, Malaysia

(Mengoptimumkan Peramalan Rawatan Tuberkulosis: Suatu Kajian Perbandingan XGBoost dengan Hiperparameter di Penang, Malaysia)

 

YANIZA SHAIRA ZAKARIA1, NUR AFIQAH ARIFFIN2,*, AZIZUL AHMAD3, RUSLAN RAINIS2, AIDY M. MUSLIM1 & WAN MOHD MUHIYUDDIN WAN IBRAHIM2

 

1Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia
2Geography Section, School of Humanities, Universiti Sains Malaysia (USM), 11800 Pulau Pinang, Malaysia
3Centre for Spatially Integrated Digital Humanities (CSIDH), Faculty of Social Sciences & Humanities, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia

 

Received: 24 April 2024/Accepted: 4 November 2024

 

Abstract

Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. It primarily affects the lungs but can also involve other organs, including the liver. TB is a significant public health concern in developing countries, where it is often associated with poverty, poor living conditions, and limited access to healthcare services. According to the World Health Organization (2023), TB continues to pose a substantial risk to public health globally, affecting millions of people each year and causing around 1.5 million deaths in 2020. Healthcare providers often face significant challenges in managing TB, which contributes to uncertain treatment outcomes. This study introduces a novel method for improving TB treatment using advanced machine learning techniques, with particular emphasis on XGBoost and other predictive models, to predict individual treatment outcomes from clinical data in Penang State, Malaysia. The clinical data were anonymized before analysis, and the models were trained on Penang data from 2017. The accuracy of each model, determined by comparing predicted outcomes with observed outcomes, was compared to identify the best-performing method. On the 2017 data, the decision tree achieved an accuracy of 63.7%, logistic regression 63.3%, and XGBoost 66.3%, while hyperparameter-tuned XGBoost performed best at 68.1%. These results indicate that supervised learning can predict TB treatment outcomes and that calibrated ensemble models such as XGBoost make reliable predictions. Incorporating additional clinical characteristics may further improve the forecasts. The primary objective was to develop a reliable, clinically validated instrument that enhances TB treatment while optimizing resource efficiency across diverse healthcare environments.
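The abstract states that accuracy was obtained by comparing observed and predicted outcomes and that XGBoost was hyperparameter-tuned, but it does not give the procedure. The following is a minimal Python sketch of those two computations, assuming outcomes are coded as categorical labels and assuming tuning by exhaustive grid search (the abstract does not name the search strategy). `max_depth` and `learning_rate` are genuine XGBoost parameter names, but the grid values, the example labels, and the `toy_score` function are hypothetical placeholders, not the study's settings or data.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of cases whose predicted outcome equals the observed outcome."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one case is required")
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


def grid_search(
    param_grid: Dict[str, List],
    score: Callable[[Dict], float],
) -> Tuple[Optional[Dict], float]:
    """Try every combination in param_grid; return the best one and its score.

    `score` is expected to train a model with the given parameters and
    return its accuracy on held-out data (higher is better).
    """
    names = sorted(param_grid)
    best_params: Optional[Dict] = None
    best_score = float("-inf")
    # Cartesian product of all candidate values, one combination at a time.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


if __name__ == "__main__":
    # Made-up outcome labels for illustration only; not the study's data.
    observed = ["success", "success", "failure", "success", "failure"]
    predicted = ["success", "failure", "failure", "success", "success"]
    print(accuracy(observed, predicted))  # 0.6 (3 of 5 match)

    # Hypothetical grid over two real XGBoost parameter names.
    grid = {"max_depth": [3, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}

    # Hypothetical toy stand-in for "train XGBoost, return validation accuracy".
    def toy_score(p: Dict) -> float:
        return 1.0 - 0.1 * abs(p["max_depth"] - 4) - abs(p["learning_rate"] - 0.1)

    print(grid_search(grid, toy_score))  # ({'learning_rate': 0.1, 'max_depth': 4}, 1.0)
```

In a real pipeline, `score` would fit a classifier such as `xgboost.XGBClassifier(**params)` on the training split and return `accuracy` on a separate validation split. Exhaustive search is simple and reproducible, but its cost grows multiplicatively with every parameter added to the grid.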

 

Keywords: Classification; hyperparameter; logistic regression; prediction; random forest; tuberculosis  

 


 

REFERENCES

Abdullahi, O.A., Ngari, M.M., Sanga, D., Katana, G. & Willetts, A. 2019. Mortality during treatment for tuberculosis; a review of surveillance data in a rural county in Kenya. PLoS ONE 14(7): e0219191. https://doi.org/10.1371/journal.pone.0219191

Ahmad, A., Kelana, M.H., Soda, R., Jubit, N., Mohd Ali, A.S., Bismelah, L.H. & Masron, T. 2024a. Mapping the impact: Property crime trends in Kuching, Sarawak, during and after the COVID-19 period (2020-2022). Indonesian Journal of Geography 56(1): 127-137. https://doi.org/10.22146/ijg.90057

Ahmad, A., Masron, T., Jubit, N., Redzuan, M.S., Soda, R., Bismelah, L.H. & Mohd Ali, A.S. 2024b. Analysis of the movement distribution pattern of violence crime in Malaysia’s capital region-Selangor, Kuala Lumpur, and Putrajaya. International Journal of Geoinformatics 20(2): 11-26. https://doi.org/10.52939/ijg.v20i2.3061

Ahmad, A., Masron, T., Junaini, S.N., Barawi, M.H., Redzuan, M.S., Kimura, Y., Jubit, N., Bismelah, L.H. & Mohd Ali, A.S. 2024c. Criminological insights: A comprehensive spatial analysis of crime hot spots of property offenses in Malaysia’s urban centers. Forum Geografi: Indonesian Journal of Spatial and Regional Analysis 38(1): 94-109. https://doi.org/10.23917/forgeo.v38i1.4306

Ahmad, A., Masron, T., Junaini, S.N., Kimura, Y., Barawi, M.H., Jubit, N., Redzuan, M.S., Bismelah, L.H. & Mohd Ali, A.S. 2024d. Mapping the unseen: Dissecting property crime dynamics in urban Malaysia through spatial analysis. Transactions in GIS 28(6): 1486-1509. https://doi.org/10.1111/tgis.13197

Ahmad, A., Masron, T., Kimura, Y., Barawi, M.H., Jubit, N., Junaini, S.N., Redzuan, M.S., Mohd Ali, A.S. & Bismelah, L.H. 2024e. Unveiling urban violence crime in the state of The Selangor, Kuala Lumpur and Putrajaya: A spatial–temporal investigation of violence crime in Malaysia’s key cities. Cogent Social Sciences 10(1): 2347411. https://doi.org/10.1080/23311886.2024.2347411

Ahmad, A., Masron, T., Mohd Ali, A.S., Barawi, M.H., Nordin, Z.S., Abg Ahmad, A.I., Redzuan, M.S. & Bismelah, L.H. 2024f. Exploring the potential of geographic information system (GIS) application for understanding spatial distribution of violent crime related to United Nations sustainable development goals-16 (SDGS-16). Journal of Sustainability Science and Management 19(9): 35-63. https://doi.org/10.46754/jssm.2024.09.003

Ahmad, A., Masron, T., Mohd Ali, A.S., Kimura, Y. & Junaini, S.N. 2024g. Demographic dynamics and urban property crime: A linear regression analysis in Kuala Lumpur and Putrajaya (2015-2020). Planning Malaysia: Journal of the Malaysian Institute of Planners 22(4): 302-319. https://doi.org/10.21837/pm.v22i33.1550

Ahmad, A., Masron, T., Ringkai, E., Barawi, M.H., Salleh, M.S., Jubit, N. & Redzuan, M.S. 2024h. Analisis ruangan hot spot jenayah pecah rumah di negeri Selangor, Kuala Lumpur dan Putrajaya pada tahun 2015-2020. Geografia-Malaysian Journal of Society and Space 20(1): 49-67. https://doi.org/10.17576/geo-2024-2001-04

Ali, A., Alrubei, M.A.T., Hassan, L.F.M., Al-Ja’afari, M.A.M. & Abdulwahed, S.H. 2020. Diabetes diagnosis based on KNN. IIUM Engineering Journal 21(1): 175-181. https://doi.org/10.31436/iiumej.v21i1.1206

Ariffin, N.A., Wan Ibrahim, W.M.M., Rainis, R., Samat, N., Mohd Nasir, M.I., Abdul Rashid, S.M.R., Ahmad, A. & Zakaria, Y.S. 2024. Identification of trends, direction of distribution and spatial pattern of tuberculosis disease (2015-2017) in Penang. Geografia-Malaysian Journal of Society and Space 20(1): 68-84. https://doi.org/10.17576/geo-2024-2001-05

Bismelah, L.H., Masron, T., Ahmad, A., Mohd Ali, A.S. & Echoh, D.U. 2024. Geospatial assessment of healthcare distribution and population density in Sri Aman, Sarawak, Malaysia. Geografia-Malaysian Journal of Society and Space 20(3): 51-67. https://doi.org/10.17576/geo-2024-2003-04

Bukundi, E.M., Mhimbira, F., Kishimba, R., Kondo, Z. & Moshiro, C. 2021. Mortality and associated factors among adult patients on tuberculosis treatment in Tanzania: A retrospective cohort study. Journal of Clinical Tuberculosis and Other Mycobacterial Diseases 24: 100263. https://doi.org/10.1016/j.jctube.2021.100263

Chabo, D., Masron, T., Jubit, N. & Ahmad, A. 2024. Analisis corak ruangan keciciran murid sekolah menengah di Sarawak. Malaysian Journal of Social Sciences and Humanities 9(9): e002906. https://doi.org/10.47405/mjssh.v9i9.2906

Dheda, K., Perumal, T., Moultrie, H., Perumal, R., Esmail, A., Scott, A.J., Udwadia, Z., Chang, K.C., Peter, J., Pooran, A., von Delft, A., von Delft, D., Martinson, N., Loveday, M., Charalambous, S., Kachingwe, E., Jassat, W., Cohen, C., Tempia, S., Fennelly, K. & Pai, M. 2022. The intersecting pandemics of tuberculosis and COVID-19: Population-level and patient-level impact, clinical presentation, and corrective interventions. The Lancet Respiratory Medicine 10(6): 603-622. https://doi.org/10.1016/S2213-2600(22)00092-3

Fayaz, S.A., Babu, L., Paridayal, L., Vasantha, M., Paramasivam, P., Sundarakumar, K. & Ponnuraja, C. 2024. Machine learning algorithms to predict treatment success for patients with pulmonary tuberculosis. PLoS ONE 19(10): e0309151. https://doi.org/10.1371/journal.pone.0309151

Gichuhi, H.W., Magumba, M., Kumar, M. & Mayega, R.W. 2023. A machine learning approach to explore individual risk factors for tuberculosis treatment non-adherence in Mukono district. PLOS Global Public Health 3(7): e0001466. https://doi.org/10.1371/journal.pgph.0001466

Gill, C.M., Dolan, L., Piggott, L.M. & McLaughlin, A.M. 2022. New developments in tuberculosis diagnosis and treatment. Breathe 18(1): 210149. https://doi.org/10.1183/20734735.0149-2021

Hrizi, O., Gasmi, K., Ben Ltaifa, I., Alshammari, H., Karamti, H., Krichen, M., Ben Ammar, L. & Mahmood, M.A. 2022. Tuberculosis disease diagnosis based on an optimized machine learning model. Journal of Healthcare Engineering 2022: 8950243. https://doi.org/10.1155/2022/8950243

Hussain, O.A. & Junejo, K.N. 2018. Predicting treatment outcome of drug-susceptible tuberculosis patients using machine-learning models. Informatics for Health and Social Care 44(2): 135-151. https://doi.org/10.1080/17538157.2018.1433676

Janssens, R.J., Mourão-Miranda, J. & Schnack, H.G. 2018. Making individual prognoses in psychiatry using neuroimaging and machine learning. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 3(9): 798-808. https://doi.org/10.1016/j.bpsc.2018.04.004

Jubit, N., Masron, T., Ahmad, A. & Soda, R. 2024a. Investigating the spatial relation between landuse and property crime in Kuching, Sarawak through location quotient analysis. Forum Geografi: Indonesian Journal of Spatial and Regional Analysis 38(2): 153-166. https://doi.org/10.23917/forgeo.v38i2.4575

Jubit, N., Masron, T., Redzuan, M.S., Ahmad, A. & Kimura, Y. 2024b. Revealing adolescent drug trafficking and addiction: Exploring school disciplinary and drug issues in the Federal Territory of Kuala Lumpur and Selangor, Malaysia. International Journal of Geoinformatics 20(6): 1-12. https://doi.org/10.52939/ijg.v20i6.3327

Jubit, N., Masron, T., Puyok, A. & Ahmad, A. 2023. Geographic distribution of voter turnout, ethnic turnout and vote choices in Johor state election. Geografia-Malaysian Journal of Society and Space 19(4): 64-76. https://doi.org/10.17576/geo-2023-1904-05

Kouchaki, S., Yang, Y., Walker, T.M., Walker, A.S., Wilson, D.J., Peto, T.E.A., Crook, D.W., CRyPTIC Consortium & Clifton, D.A. 2019. Application of machine learning techniques to tuberculosis drug resistance analysis. Bioinformatics 35(13): 2276-2282. https://doi.org/10.1093/bioinformatics/bty949

Lopez-Garnier, S., Sheen, P. & Zimic, M. 2019. Automatic diagnostics of tuberculosis using convolutional neural networks analysis of MODS digital images. PLoS ONE 14(2): e0212094. https://doi.org/10.1371/journal.pone.0212094

Marzuki, A., Bagheri, M., Ahmad, A., Masron, T. & Akhir, M.F. 2023. Establishing a GIS-SMCDA model of sustainable eco-tourism development in Pahang, Malaysia. Episodes 46(3): 375-387. https://doi.org/10.18814/epiiugs/2022/022037

Marzuki, A., Bagheri, M., Ahmad, A., Masron, T. & Akhir, M.F. 2024. Examining transformations in coastal city landscapes: Spatial patch analysis of sustainable tourism - A case study in Pahang, Malaysia. Landscape and Ecological Engineering 20: 513-545. https://doi.org/10.1007/s11355-024-00613-w

Masron, T., Ahmad, A., Jubit, N., Sulaiman, M.H., Rainis, R., Redzuan, M.S., Junaini, S.N., Jamian, M.A.H., Mohd Ali, A.S., Salleh, M.S., Zaini, F., Soda, R. & Kimura, Y. 2024. Crime Map Book. Centre for Spatially Integrated Digital Humanities (CSIDH), Faculty of Social Sciences and Humanities, Universiti Malaysia Sarawak. https://www.researchgate.net/publication/384572873_Crime_Map_Book

Miotto, R., Li, L., Kidd, B.A. & Dudley, J.T. 2016. Deep patient: An unsupervised representation to predict patients’ future from the electronic health records. Scientific Reports 6: 26094. https://doi.org/10.1038/srep26094

Nicholson, T.J., Hoddinott, G., Seddon, J.A., Claassens, M.M., van der Zalm, M.M., Lopez, E., Bock, P., Caldwell, J., Da Costa, D., de Vaal, C., Dunbar, R., Du Preez, K., Hesseling, A.C., Joseph, K., Kriel, E., Loveday, M., Marx, F.M., Meehan, S.A., Purchase, S., Naidoo, K., Naidoo, L., Solomon-Da, C.F., Sloot, R. & Osman, M. 2023b. A systematic review of risk factors for mortality among tuberculosis patients in South Africa. Systematic Reviews 12(1): 23. https://doi.org/10.1186/s13643-023-02175-8

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12: 2825-2830. https://dl.acm.org/doi/10.5555/1953048.2078195

Takarinda, K.C., Sandy, C., Masuka, N., Hazangwe, P., Choto, R.C., Mutasa-Apollo, T., Nkomo, B., Sibanda, E., Mugurungi, O., Harries, A.D. & Siziba, N. 2017. Factors associated with mortality among patients on TB treatment in the Southern Region of Zimbabwe, 2013. Tuberculosis Research and Treatment 2017: 6232071. https://doi.org/10.1155/2017/6232071

Tiwari, A. & Maji, S. 2019. Machine learning techniques for tuberculosis prediction. International Conference on Advances in Engineering Science Management & Technology (ICAESMT) - 2019, Uttaranchal University, Dehradun, India. https://doi.org/10.2139/ssrn.3404486

World Health Organization. 2022. Global Tuberculosis Report 2022. https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2022

World Health Organization. 2023. Tuberculosis. World Health Organization. https://www.who.int/news-room/fact-sheets/detail/tuberculosis

Xie, Y., Han, J., Yu, W., Wu, J., Li, X. & Chen, H. 2020. Survival analysis of risk factors for mortality in a cohort of patients with tuberculosis. Canadian Respiratory Journal 2020: 1654653. https://doi.org/10.1155/2020/1654653

Xiong, Y., Ba, X., Hou, A., Zhang, K., Chen, L. & Li, T. 2018. Automatic detection of Mycobacterium tuberculosis using artificial intelligence. Journal of Thoracic Disease 10(3): 1936-1940. https://doi.org/10.21037/jtd.2018.01.91

Yang, S., Zhu, F., Ling, X., Liu, Q. & Zhao, P. 2021. Intelligent health care: Applications of deep learning in computational medicine. Frontiers in Genetics 12: 607471. https://doi.org/10.3389/fgene.2021.607471

Zakaria, Y.S., Ahmad, A., Said, M.Z., Epa, A.E., Ariffin, N.A., M Muslim, A., Akhir, M.F. & Hussin, R. 2023. GIS and oil spill tracking model in forecasting potential oil spill-affected areas along Terengganu and Pahang coastal area. Planning Malaysia: Journal of the Malaysian Institute of Planners 21(4): 250-264. https://doi.org/10.21837/pm.v21i28.1330

 

*Corresponding author; email: zarika27@gmail.com
